Effect of Norovirus Inoculum Dose on Virus Kinetics, Shedding, and Symptoms

The effect of norovirus dose on outcomes such as virus shedding and symptoms after initial infection is not well understood. We performed a secondary analysis of a human challenge study by using Bayesian mixed-effects models. As the dose increased from 4.8 to 4,800 reverse transcription PCR units, the total amount of shed virus in feces increased from 4.5 × 1011 to 3.4 × 1012 genomic equivalent copies; in vomit, virus increased from 6.4 × 105 to 3.0 × 107 genomic equivalent copies. Onset time of viral shedding in feces decreased from 1.4 to 0.8 days, and time of peak viral shedding decreased from 2.3 to 1.5 days. Time to symptom onset decreased from 1.5 to 0.8 days. One type of symptom score increased. An increase in norovirus dose was associated with more rapid shedding and symptom onset and possibly increased severity. However, the effect on virus load and shedding was inconclusive.

The dataset of ftsshedding (fecal virus shedding) has several variables necessary for later time-series modeling (Appendix Table 3).
The dataset of css_10symptoms has ten kinds of symptoms collected from the trial. The dataset of css has a variable (css) for comprehensive symptom score (CSS). The dataset of mvs has one variable (mvs) for the Modified Vesikari score (MVS), and five variables (mvsc1 to mvsc5) for five score components, respectively.

Analysis files
The analyses folder contains all R code for data analysis. For different outcomes, there is always one script that performs the model fitting (<OUTCOME>.R), and one for generating figures (<OUTCOME>_plot.R). Readers should choose an outcome and run the modeling code first, and then execute the plotting code. These are the existing scripts for fitting the different outcomes: • allshedding_con.R fits models to the 3 total shedding outcomes, treating dose as continuous. The code fits models for both the 19 individuals shown in the main text, and the 20 individuals shown in the supplementary results.
• allshedding_cat.R fits models to the 3 total shedding outcomes, treating dose as categorical.
• tsshedding_con.R and tsshedding_cat.R fit the time-series data for fecal shedding to models, treating dose either continuous or categorical, respectively. allsymptoms_con.R and allsymptoms_cat.R fit the incubation period, CSS, and MVS outcomes for continuous or categorical dose, respectively.
Model results are saved in the /results/rds/, and figures are saved in the /results/plots/.

Reproducing results
To reproduce all our analyses, please follow these steps: 1. Double click norovirus.Rproj.
3. Open and run the R script allshedding_con.R, and then allshedding_con_plot.R. 4. Open and run the R script allsymptoms_con.R, and then allsymptoms_con_plot.R. 5. Open and run the R script allshedding_cat.R, and then allshedding_cat_plot.R. 6. Open and run the R script allsymptoms_cat.R, and then allsymptoms_cat_plot.R. 7. Open and run the R script tsshedding_con.R, and then tsshedding_con_plot.R. 8. Open and run the R script tsshedding_cat.R, and then tsshedding_cat_plot.R. 9. Open and run the RMD script manuscript_main_text.rmd, and supplementary_text.rmd.
In our Bayesian models, we implemented four chains with four CPU cores. Depending on the computer performance, models may need a few hours to run.

Analysis with rethinking package
All results presented in the main text and supplement are fit using Stan through the brms R package. As a check, we also implemented the same models with the rethinking package, which provides another R interface to the Stan fitting software.
The code for the models implemented with rethinking is in the folder 2_supplementary_materials/analyses/rethinking-analysis folder.
To run these scripts, the rethinking R package will need to be installed follwing the instructions here: https://github.com/rmcelreath/rethinking The folder contains three scripts which must be run in the following order: 1. Open and run the R script models.R. This script fits the models to the data and consequently will have a long run time.
2. Open and run the R script preds.R. This script processes the Stan model objects and generates the model predictions of interest. Some steps, particularly processing the time series models, may have a long run time.
3. Open and run the R script figs.R. This script generates replicate figures similar to those included in the manuscript and the supplement.
All of the results (model objects, predictions, and figures) will be saved in the folder 2_supplementary_materials/analyses/rethinking-results. If this folder and the necessary subfolders do not exist, they will be created for you the first time the script models.R is run.
Finally, we recommend restarting the R session between scripts to ensure that sufficient memory is available.

Session information
We provide our PC session information as below:

Analysis of virus shedding
The following sections provide additional details and further results related to our analyses of virus shedding.

Accounting for limits of detection in virus shedding
The study that produced the data used two different methods to determine virus concentration in feces (1). The first method was a quantitative real-time RT-PCR (qRT-PCR) method. This method was able to detect virus up to a limit of detection of 40 × 10 6 genomic equivalent copies (GEC). If a sample was below this level (negative qPCR reading), the sample was retested using an immunomagnetic capture (IMC) RT-PCR assay. IMC RT-PCR is more sensitive, with a lower limit of detection of 15 × 10 3 GEC, but only produces a qualitative positive or negative readout. We labeled the limit of 15 × 10 3 GEC as LOD1, and 40 × 10 6 as LOD2. Thus, virus samples have a numeric value if they are above LOD2, are labeled as positive if they are below LOD2 but above LOD1, and are labeled as negative if they are below LOD1.
We dealt with these detection limits as follows.
For the analyses involving the time-series data of fecal virus shedding in our Bayesian nonlinear mixed effects model, we used the built-in approach of brms to handle censored values and deal with the limits of detection (2). In the time series models, the missing data were considered as censors. For example, with two LODs, 1) If the data was below both LODs (-,-), the interval was between 1 and 15000 (which leads to 0 and log(15000) on the modeling scale).
2) If the data was (+,-), the interval was between 15000 and 4e7; 3) If the data was (+,+), instead of interval, the observed value was used. In brms, the likelihood function uses a cumulative distribution function for censored data (3).
For the simple analysis that does not use the time-series model, we substituted values. If a sample had a quantitative concentration above 40 × 10 6 GEC, we used the numeric value. If a sample was reported as positive (concentration between 40 × 10 6 GEC and 15 × 10 3 GEC), we used the geometric mean of those two values (≈ 7.75 × 10 5 GEC). Similarly, if a sample was recorded as negative (concentration below 15 × 10 3 GEC), we used the geometric mean of 15 × 10 3 GEC and 1 GEC (≈ 122 GEC). To explore the impact of these choices, we additional performed two sensitivity analyses. We set values below the LOD1 to 1, and values between LOD1 and LOD2 were set to either 15000 (the LOD1 value) or 4e7 (the LOD2 value), to explore two extremes. These additional analyses are shown further below in this supplement.
To compute the total amount of virus shed per shedding event, we multiplied the virus concentration with the weight of the shed feces (i.e., GEC/g × weight of feces). Finally, for each individual, we summed values for all shedding events.
Data for vomiting had similar limits of detection. These data were recorded as either a numeric value above 2,200 GEC, a positive readout below 2,200 GEC or a negative readout.
For the vomiting data, we did not have information on the limit of detection for the qualitative assay. Based on a comparison of the two methods (4), we made the assumption that the LOD for the qualitative assay was a factor of 10 lower, thus 220 GEC. We then again took the geometric mean of 2,200 GEC and 220 GEC for the positive values (≈ 696 GEC), and 220 GEC and 1 GEC for the negative values (≈ 15 GEC). All virus concentrations were then multiplied with total vomit volume (i.e., virus particles/ml × volume of vomit), then summed those for each individual. This included making the additional assumption that 1 g of vomit equated to 1 mL (5).

Modeling total virus shedding
We computed total virus shedding by multiplying virus concentration with sample weight (feces) or volume (vomit) for each shedding event, and summing all values for each individual.
We applied a mixed effects model that investigated associations between the outcomes (total shedding amount) and inoculum dose. For the analysis shown in the main text, we assumed a linear relationship between the log of the dose ( ) and the log of the outcome ( ).
We fitted the same model to each of the 3 outcomes. Those are 1) fecal shedding during shedding if the dose was zero, which is biologically not meaningful and makes assigning reasonable priors more difficult (6).) Priors are chosen based on what we know about the virus kinetics, and to ensure prior predictive simulations produce flexible but reasonable outcomes (6). We used Half-Cauchy distributions for priors of standard deviation, because Cauchy has fatter tails than normal distribution which will make priors less informative. Since vomit outcome values were lower than those for feces, we adjusted the prior for this model to have a mean around 20.
As part of our sensitivity analyses, presented below, we also explored a model which treats dose as categorical. The following lines in the model are adjusted accordingly: Linear model: = + Adaptive priors: ∼ Normal(0, ) Fixed priors: ∼ Half-Cauchy(0,2) ∼ Normal(25,5), In this notation, the parameter takes on discrete values based on the dose level of each individual, . Specifically, the 3 categories are low, medium, or high dose, corresponding to 4.8, 48, or 4800 RT-PCR units.

Modeling longitudinal virus concentration kinetics
To model the longitudinal time-series of virus concentration in feces, we used a previously developed equation that was shown to describe virus time-course in acute viral infections well (7). This equation provides a good empirical function to fit the increase, then decrease of viral load seen in many acute viral infections. The equation has four parameters and is given by: The outcome of interest is virus load as a function of time, ( ). The model parameters approximately represent the peak virus load ( ), the initial exponential growth rate ( ), the time of virus peak ( ) and the eventual rate of virus decline ( ). (The parameters only approximately map to those biologic quantities, see (7) for details.) Since all parameters in the model above need to be positive to achieve biologically reasonable trajectories for ( ), we rewrote the equation for our purpose with exponentiated parameters. That is, we assume that the deterministic portion of the virus load data (on a log scale, denoted below as , ), is described for each individual, , by Since there was a fair amount of variability in the virus load data, we modeled the outcome using a non-standardized Student-t distribution instead of a Normal distribution, which provides more robust estimates in the presence of strong variability (2,6). Specifically, we model the virus load data as Values for priors were chosen to be weakly informative, such that prior predictive simulations produced virus-load trajectories that made biologic sense, while still allowing for a wide variety of possible trajectories to be informed by fitting to the data. The peak level of virus concentration for norovirus in our data and previous studies is broadly in the range of 10 10 to 10 14 (1,8,9) (around 23-35 in log units). We approximately centered our prior around those values, while allowing for a broad standard deviation so that the data will dominate the posterior results. Similarly, the growth rate, , will have a value such that the peak is reached within the first several days following infection. A Normal distribution with a mean of 3 (in log units) can produce the necessary range of value. Virus is expected to peak a few days following infection.
Thus we chose and 1 to have normal distributions with a spread such that the range of values for exp( ) is centered around the first few days. The decay rate, , needs to allow for a decline of virus to undetectable levels that can be as fast as a week or longer than a month. A Normal distribution with log-transformed mean of −1 can produce such outcomes.
We performed prior predictive simulations to ensure that our priors led to biologically  As before, in this categorical analysis, is the dose category for individual , and is either low, medium or high.

Analysis of symptom outcomes
The following sections provide additional details and further results related to our analyses of infection symptoms.
We considered three different symptom-related outcomes, namely time to first symptom onset (incubation period) and two versions of scores that quantify overall infection severity.
Since individuals were allowed to leave the study center after 96 hours and illness is typically self-limiting lasting for only a few days (10), we calculated the scores based on records in the first 4 days.

Incubation period
Incubation period, i.e., the time between infection and onset of symptoms, was directly computed from the data as the time-span between reported time at which the infection challenge was administered, and the time at which the first symptom was reported.

Modified Vesikari score (MVS)
The modified Vesikari score (MVS) is a previously defined quantity that has been used in a modified form by several studies to measure norovirus severity (11)(12)(13)(14). The score has the seven components shown in Appendix Table 4. For our dataset, since volunteers were housed in a healthcare setting, the health care provider visit component was not applicable, and we thus removed it from the score calculation (13). Since we did not have information on treatment, we also dropped that component. This left us with a five-component score, which was computed for each individual following the rules shown in Appendix Table 4.
Appendix Table 5 shows the scores for all 20 volunteers in the study that were part of our analyses.

Comprehensive symptom score (CSS)
In addition to the modified Vesikari score, we defined and computed a comprehensive symptom score which took all recorded symptoms into account.
The study reported the following symptoms: body temperature, malaise, muscle aches, headache, nausea, chills, anorexia, cramps, unformed or liquid feces, and vomiting. Clinical symptoms (except feces and vomiting) were reported as none, mild, moderate, or severe, which we coded as a score of 0 to 3. For feces, we used a scoring of solid = 0, unformed = 1, and liquid = 2. Vomit was reported as absent or present and scored as 0 or 1.
Individuals had their symptoms recorded at different times and frequencies throughout the day. Thus, summing up the recorded scores would have introduced bias due to different recording frequencies. Thus, we instead determined the highest score per symptom for each individual per day, and summed those. This produced a daily total symptom score for each individual. We then summed those to obtain our comprehensive symptom score. For example, if one individual had daily total symptom score values of 5 (1st day), 10 (2nd day), 2 (3rd day), and 0 (4th day), the final total symptom scores would be 17.
Appendix Table 6 shows the daily and total comprehensive score values for the 20 individuals, Appendix Figure 1 shows the same information in graphical form, stratified by dose.

Modeling symptom outcomes
Using following models, we tested the impact of dose on each kind of symptom in the trial and two types of symptom scores.
The incubation period is positive and not too far from zero (as measured in days). Thus, to prevent any possible negative outcomes, we assumed it followed a log-normal distribution.
The rest of the model is similar to those implemented for the total shedding outcomes, with the full model is given below.

Model setup, outcomes and diagnostics
In our Bayesian models, we generally implemented four chains with ≈30k iterations with 25k as a warmup to obtain 5k posterior samples for final results. We assessed Rhat, mixing of chains, and posterior distributions to ensure convergence.
As described above, the models were implemented in R using the brms package. We report results based on predictions from the posterior samples of the models (3). The predictions are based on an assumption that a new individual, who theoretically similar to individuals in this trial but has not been enrolled previously, be given with a certain amount of virus. Then, the model posterior predictive distribution of the new individual was summarized by a mean and credible interval for an outcome.

Evaluation of prior and posterior distributions for model parameters
To ensure that our results were not overly influenced by the choice of prior distributions, we compared prior and posterior distributions.
The parameter's prior and posterior distributions showed that the choice of prior distribution had no significant impact on our results (Appendix Figure 2 to 4).

Results from additional analyses Fecal virus shedding during the full observation period
In the main text, we focus on the data collected during the time individuals were under direct observation, since this ensures complete collection of all feces. Here, we investigate the relation between dose and total fecal shedding for all samples, including those individuals were asked to collect after returning home. We did not find evidence for associations between dose and virus shedding.

Impact of changing assumption for LOD values
As described above, for the non time-series analysis shown in the main text, we set measurements that were below LOD2 but above LOD1 to the geometric mean of those 2 values, and values below LOD1 were set to the geometric mean of LOD1 and 1. To explore the impact of these choices, we tested a few different choices for those values. The next figures show that the overall observed pattern remains consistent, independent of the choices for the LOD values.

Vomit events
In each dose group, only a few individuals had vomiting events. Some of these individuals had multiple vomiting events. Appendix Figure 8 graphically displays the vomiting event data. The recorded vomiting events were not sufficient to allow for a time-series analysis.

Individual-level fits to fecal sheedding time series data
The longitudinal virus shedding data in feces and fitted model results are shown in Appendix Figure 9 for each individual.

Correlation between total virus shed in feces and virus load
We compared the fitted total virus load (area under the curve), to the observed total virus shed in feces to assess model performance. We found strong correlation, which support the robustness of our models. The following figures show the association between growth rate and symptom scores (MVS and CSS) for each individual. They are nonlinear curves because negative binomial distributions were used in models similarly to the symptom scores (Appendix Figure 4, main manuscript). The range of x-axis was based on data.

Treating dose as a categorical variable
The following figures repeat the analyses and results shown in the main text, but now with dose treated as categorical.

Assessing association of total virus shedding with dose
Assessing association of symptoms with dose Appendix Figure 18 shows data and model estimates for the different symptom outcomes for each dose category. Again, the patterns seen are similar to those shown in the main text. One difference is that the comprehensive score only increases for the highest dose group, while the two lower groups are similar.
Looking at 95% credible interval of each symptom's score (horizontal bars in Appendix Figure 19) by 3 dose groups, we again find the pattern reported above for the case of treating dose as continuous. Namely, symptoms associated with MVS do not show any pronounced variation with dose, while several symptoms that are part of the CSS but not the MVS do show variation with dose.

Analyses with inclusion of the very-low dose infected individual
As explained in the main text, the original study also administered a dose of 0.48 RT-PCR units. At that dose level, only a single challenged individual became infected. We removed this person for the analysis presented in the main text. However, we also decided to conduct a sensitivity analysis that re-computes all results shown in the main text, now with the additional individual included.

Assessing association of total virus shedding with dose
Comparison of this figure with the one shown in the main text shows overall similar results.

Modeling of virus concentration in feces
As Appendix Figures 21 -23 show, the findings remain essentially unchanged for these outcomes compared to what is shown in the main text.

Assessing association of symptoms with dose
As Appendix Figure 24 shows, the association between dose and symptom outcomes also remains essentially unchanged when including the additional individual in the analysis. We also again find a similar pattern between individual symptoms and dose as found previously (Appendix Figure 25). Comprehensive symptom score components (CSSC) css Comprehensive symptom score (CSS) mvs Modified Vesikari score (MVS) Appendix Figure 7. The association between dose and virus shedding in feces (first 4 days) when values below the LOD1 were set to 1, and values between LOD1 and LOD2 were set to 4e7 (the LOD2 value).